Skip to content

UCT/CUDA: Skip CUDA cleanup for destroyed contexts#11437

Merged
roiedanino merged 6 commits into
openucx:masterfrom
roiedanino:cuda/ctx-destroy-local
May 27, 2026
Merged

UCT/CUDA: Skip CUDA cleanup for destroyed contexts#11437
roiedanino merged 6 commits into
openucx:masterfrom
roiedanino:cuda/ctx-destroy-local

Conversation

@roiedanino

Copy link
Copy Markdown
Contributor

What?

Avoid calling cuStreamDestroy or cuEventDestroy on resources from a context that is no longer valid

Why?

CUDA may warn or crash in that path.

Applications can reset the CUDA primary context before MPI_Finalize. Avoid calling cuStreamDestroy or cuEventDestroy on resources from a context that is no longer valid, since CUDA may warn or crash in that path.

Signed-off-by: Roie Danino <rdanino@nvidia.com>
@roiedanino roiedanino requested a review from rakhmets May 7, 2026 16:06
@roiedanino roiedanino self-assigned this May 7, 2026
Comment thread src/uct/cuda/base/cuda_iface.c Outdated
Comment thread src/uct/cuda/base/cuda_iface.c Outdated
Comment thread src/uct/cuda/base/cuda_iface.c Outdated
Signed-off-by: Roie Danino <rdanino@nvidia.com>
Signed-off-by: Roie Danino <rdanino@nvidia.com>
Signed-off-by: Roie Danino <rdanino@nvidia.com>
@roiedanino roiedanino requested a review from iyastreb May 12, 2026 11:03
Comment thread test/gtest/uct/cuda/test_cuda_ctx_cleanup.cc Outdated
Comment thread test/gtest/uct/cuda/test_cuda_ctx_cleanup.cc Outdated
…ther than complex hook installations affecting the entire gtest binary

Signed-off-by: Roie Danino <rdanino@nvidia.com>
@roiedanino roiedanino requested a review from iyastreb May 13, 2026 06:43
@roiedanino

Copy link
Copy Markdown
Contributor Author

Hi @rakhmets @iyastreb can re-review please?

@yosefe yosefe left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found one issue.

Comment thread test/gtest/uct/cuda/test_cuda_ctx_cleanup.cc Outdated
Signed-off-by: Roie Danino <rdanino@nvidia.com>
@roiedanino roiedanino requested a review from yosefe May 19, 2026 09:34
@roiedanino

Copy link
Copy Markdown
Contributor Author

@yosefe @rakhmets @iyastreb
Is it good to go?

@gleon99

gleon99 commented May 27, 2026

Copy link
Copy Markdown
Contributor

@roiedanino need to cuCtxPushCurrent in restore_current_cuda_context.

@roiedanino roiedanino merged commit 73c9653 into openucx:master May 27, 2026
154 checks passed
return status;
}

if (primary_ctx != ctx_rsc->ctx) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's safer to use context ID to check that contexts are the same: cuCtxGetId. The method was added in CUDA 12 to solve the problem when newly created context could have the same pointer address as the previously deleted one.

Comment on lines +64 to +67
/* Retained CUDA primary context, if @ctx is a primary context */
CUcontext primary_ctx;
/* CUDA device of @primary_ctx */
CUdevice cuda_device;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use cuda_device field as a flag that ctx field represents a primary device context.

event_mp);
int pushed;

if (uct_cuda_base_ctx_rsc_push(ctx_rsc, &pushed) != UCS_OK) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to push the context here? Can we just check that retained primary context has the same ID as one that was stored during uct_cuda_ctx_rsc_t creation?

gleon99 pushed a commit that referenced this pull request May 31, 2026
* UCT/CUDA/BASE: CUDA ctx - Fixing #11437 CR comments

Signed-off-by: Roie Danino <rdanino@nvidia.com>

* UCT/CUDA/BASE: Avoid pushing context during cleanup

Signed-off-by: Roie Danino <rdanino@nvidia.com>

---------

Signed-off-by: Roie Danino <rdanino@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants